1 Executive summary

TODO - write executive summary

2 Libraries

Libraries used to prepare the report.

library(knitr)
library(dplyr)
library(R.utils)
library(data.table)
library(tools)
library(stringr)
library(ggplot2)
library(plotly)
library(tidyr)
library(scales)

3 Data loading

Code to load datasets from compressed files stored in a specified folder.

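# Folder containing the compressed CSV files
folder_name <- "data"
csv_files <- list.files(folder_name,
                        pattern = "\\.csv\\.gz$",
                        full.names = FALSE)
# Strip both extensions, e.g. "colors.csv.gz" -> "colors"
files_names <- file_path_sans_ext(csv_files, compression = TRUE)

# Read each file into a data.table named "<dataset>_df", e.g. colors_df
for (file_name in files_names) {
  assign(paste0(file_name, "_df"),
         fread(file.path(
           folder_name,
           paste0(file_name, ".csv.gz")
         )))
}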

csv_files
##  [1] "colors.csv.gz"             "elements.csv.gz"          
##  [3] "inventories.csv.gz"        "inventory_minifigs.csv.gz"
##  [5] "inventory_parts.csv.gz"    "inventory_sets.csv.gz"    
##  [7] "minifigs.csv.gz"           "part_categories.csv.gz"   
##  [9] "part_relationships.csv.gz" "parts.csv.gz"             
## [11] "sets.csv.gz"               "themes.csv.gz"
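
The loop above creates one global variable per file with assign(). As a possible alternative, the sketch below keeps all tables in a single named list, which is easier to iterate over later. It is only an illustration: it uses base R's read.csv() instead of fread(), writes two tiny toy files to a temporary folder, and the helper write_gz() and the folder name lego_demo are hypothetical.

```r
# Toy setup (assumption): a temporary folder with two small .csv.gz files.
folder_name <- file.path(tempdir(), "lego_demo")
dir.create(folder_name, showWarnings = FALSE)

# Hypothetical helper: write a data frame as a gzip-compressed CSV.
write_gz <- function(df, name) {
  con <- gzfile(file.path(folder_name, paste0(name, ".csv.gz")), "w")
  write.csv(df, con, row.names = FALSE)
  close(con)
}
write_gz(data.frame(id = c(0, 1), name = c("Black", "Blue")), "colors")
write_gz(data.frame(id = c(1, 3), name = c("Technic", "Competition")), "themes")

# Find the compressed CSV files and derive dataset names from them.
csv_files <- list.files(folder_name, pattern = "\\.csv\\.gz$")
dataset_names <- sub("\\.csv\\.gz$", "", csv_files)

# read.csv() opens gzip-compressed files transparently.
datasets <- lapply(file.path(folder_name, csv_files), read.csv)
names(datasets) <- dataset_names

names(datasets)
```

Individual tables are then available as datasets$colors, datasets$themes, and so on, and functions such as lapply() can be applied to all of them at once.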

4 Data introduction

The section below presents the datasets used for analysis, including their structure, dimensions, and basic statistics.

Total dataset size
Rows_count Columns_count Values_count NA_percentage
1446639 45 8099232 0.29%
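
The totals above are consistent with summing over the twelve tables described in section 4.2: Rows_count and Columns_count are the sums of the per-table rows and columns, and Values_count is the sum of rows × columns per table. A minimal sketch of such a computation is shown below; it assumes the datasets are held in a named list (here two toy tables built from rows shown later in this report), which is not necessarily how the report computes it.

```r
# Toy stand-ins for the real tables (a few rows copied from the heads below).
datasets <- list(
  themes = data.frame(
    id = c(1, 3, 4),
    name = c("Technic", "Competition", "Expert Builder"),
    parent_id = c(NA, 1, 1)
  ),
  colors = data.frame(
    id = c(-1, 0),
    name = c("[Unknown]", "Black"),
    rgb = c("0033B2", "05131D"),
    is_trans = c("f", "f")
  )
)

# Aggregate sizes across all tables.
rows_count    <- sum(sapply(datasets, nrow))
columns_count <- sum(sapply(datasets, ncol))
values_count  <- sum(sapply(datasets, function(df) nrow(df) * ncol(df)))
na_count      <- sum(sapply(datasets, function(df) sum(is.na(df))))

total_size <- data.frame(
  Rows_count    = rows_count,
  Columns_count = columns_count,
  Values_count  = values_count,
  NA_percentage = sprintf("%.2f%%", 100 * na_count / values_count)
)
total_size
```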

4.1 Data structure

4.2 Datasets summaries

4.2.1 Colors

Colors dataset dimensions
Rows Columns
263 4
Colors dataset basic statistics
id name rgb is_trans
Min. : -1.0 Length:263 Length:263 Length:263
1st Qu.: 83.0 Class :character Class :character Class :character
Median :1005.0 Mode :character Mode :character Mode :character
Mean : 651.4
3rd Qu.:1070.5
Max. :9999.0
Head of Colors dataset
id name rgb is_trans
-1 [Unknown] 0033B2 f
0 Black 05131D f
1 Blue 0055BF f
2 Green 237841 f
3 Dark Turquoise 008F9B f
4 Red C91A09 f
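
Each subsection of 4.2 reports the same three items: dimensions, basic statistics, and the first rows. One way to produce them uniformly is a small helper such as the hypothetical describe_dataset() below, shown here on the six Colors rows listed above. This is a sketch in base R, not the report's actual code.

```r
# Hypothetical helper: collect dimensions, summary statistics and head.
describe_dataset <- function(df, n = 6) {
  list(
    dimensions = data.frame(Rows = nrow(df), Columns = ncol(df)),
    statistics = summary(df),
    head       = head(df, n)
  )
}

# The six rows shown in "Head of Colors dataset".
colors_head <- data.frame(
  id       = c(-1, 0, 1, 2, 3, 4),
  name     = c("[Unknown]", "Black", "Blue", "Green", "Dark Turquoise", "Red"),
  rgb      = c("0033B2", "05131D", "0055BF", "237841", "008F9B", "C91A09"),
  is_trans = rep("f", 6)
)

info <- describe_dataset(colors_head)
info$dimensions
info$statistics
```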

4.2.2 Elements

Elements dataset dimensions
Rows Columns
84138 4
Elements dataset basic statistics
element_id part_num color_id design_id
Min. : 9327 Length:84138 Min. : -1.0 Min. : 1001
1st Qu.: 4259774 Class :character 1st Qu.: 8.0 1st Qu.: 18454
Median : 6057754 Mode :character Median : 28.0 Median : 41748
Mean : 5222065 Mean : 539.7 Mean : 45570
3rd Qu.: 6262024 3rd Qu.: 135.0 3rd Qu.: 75474
Max. :61532443 Max. :9999.0 Max. :107520
NA’s :23682
Head of Elements dataset
element_id part_num color_id design_id
6443403 2277c01pr0009 1 2277
6300211 67906c01 14 67908
4566309 2564 0 2564
4275423 53657 1004 53657
6194308 92926 71 28967
6229123 26561 4 26561

4.2.3 Inventories

Inventories dataset dimensions
Rows Columns
37265 3
Inventories dataset basic statistics
id version set_num
Min. : 1 Min. : 1.000 Length:37265
1st Qu.: 14424 1st Qu.: 1.000 Class :character
Median : 54379 Median : 1.000 Mode :character
Mean : 61104 Mean : 1.091
3rd Qu.: 88842 3rd Qu.: 1.000
Max. :194312 Max. :16.000
Head of Inventories dataset
id version set_num
1 1 7922-1
3 1 3931-1
4 1 6942-1
15 1 5158-1
16 1 903-1
17 1 850950-1

4.2.4 Inventory minifigs

Inventory minifigs dataset dimensions
Rows Columns
20858 3
Inventory minifigs dataset basic statistics
inventory_id fig_num quantity
Min. : 3 Length:20858 Min. : 1.000
1st Qu.: 7869 Class :character 1st Qu.: 1.000
Median : 15681 Mode :character Median : 1.000
Mean : 43010 Mean : 1.062
3rd Qu.: 66834 3rd Qu.: 1.000
Max. :194312 Max. :100.000
Head of Inventory minifigs dataset
inventory_id fig_num quantity
3 fig-001549 1
4 fig-000764 1
19 fig-000555 1
25 fig-000574 1
26 fig-000842 1
26 fig-008641 1

4.2.5 Inventory parts

Inventory parts dataset dimensions
Rows Columns
1180987 6
Inventory parts dataset basic statistics
inventory_id part_num color_id quantity is_spare img_url
Min. : 1 Length:1180987 Min. : -1.0 Min. : 1.00 Length:1180987 Length:1180987
1st Qu.: 9404 Class :character 1st Qu.: 4.0 1st Qu.: 1.00 Class :character Class :character
Median : 22838 Mode :character Median : 15.0 Median : 2.00 Mode :character Mode :character
Mean : 50849 Mean : 131.8 Mean : 3.37
3rd Qu.: 87088 3rd Qu.: 71.0 3rd Qu.: 4.00
Max. :194312 Max. :9999.0 Max. :3064.00
Head of Inventory parts dataset
inventory_id part_num color_id quantity is_spare img_url
1 48379c01 72 1 f https://cdn.rebrickable.com/media/parts/photos/1/48379c01-1-e7daa845-2671-4737-8642-3b1574308155.jpg
1 48395 7 1 f https://cdn.rebrickable.com/media/parts/photos/7/48395-7-b9152acf-2fa5-4836-a04d-5b7fd39c2406.jpg
1 stickerupn0077 9999 1 f
1 upn0342 0 1 f
1 upn0350 25 1 f
3 2343 47 1 f https://cdn.rebrickable.com/media/parts/elements/3000240.jpg

4.2.6 Inventory sets

Inventory sets dataset dimensions
Rows Columns
4358 3
Inventory sets dataset basic statistics
inventory_id set_num quantity
Min. : 35 Length:4358 Min. : 1.000
1st Qu.: 8076 Class :character 1st Qu.: 1.000
Median : 16423 Mode :character Median : 1.000
Mean : 52519 Mean : 1.813
3rd Qu.: 98685 3rd Qu.: 1.000
Max. :191576 Max. :60.000
Head of Inventory sets dataset
inventory_id set_num quantity
35 75911-1 1
35 75912-1 1
39 75048-1 1
39 75053-1 1
50 4515-1 1
50 4520-1 2

4.2.7 Minifigs

Minifigs dataset dimensions
Rows Columns
13764 4
Minifigs dataset basic statistics
fig_num name num_parts img_url
Length:13764 Length:13764 Min. : 0.000 Length:13764
Class :character Class :character 1st Qu.: 4.000 Class :character
Mode :character Mode :character Median : 4.000 Mode :character
Mean : 5.296
3rd Qu.: 5.000
Max. :156.000
Head of Minifigs dataset
fig_num name num_parts img_url
fig-000001 Toy Store Employee 4 https://cdn.rebrickable.com/media/sets/fig-000001.jpg
fig-000002 Customer Kid 4 https://cdn.rebrickable.com/media/sets/fig-000002.jpg
fig-000003 Assassin Droid, White 8 https://cdn.rebrickable.com/media/sets/fig-000003.jpg
fig-000004 Man, White Torso, Black Legs, Brown Hair 4 https://cdn.rebrickable.com/media/sets/fig-000004.jpg
fig-000005 Captain America with Short Legs 3 https://cdn.rebrickable.com/media/sets/fig-000005.jpg
fig-000006 Lloyd Avatar 5 https://cdn.rebrickable.com/media/sets/fig-000006.jpg

4.2.8 Part categories

Part categories dataset dimensions
Rows Columns
66 2
Part categories dataset basic statistics
id name
Min. : 1.00 Length:66
1st Qu.:19.25 Class :character
Median :35.50 Mode :character
Mean :35.36
3rd Qu.:51.75
Max. :68.00
Head of Part categories dataset
id name
1 Baseplates
3 Bricks Sloped
4 Duplo, Quatro and Primo
5 Bricks Special
6 Bricks Wedged
7 Containers

4.2.9 Part relationships

Part relationships dataset dimensions
Rows Columns
29977 3
Part relationships dataset basic statistics
rel_type child_part_num parent_part_num
Length:29977 Length:29977 Length:29977
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Head of Part relationships dataset
rel_type child_part_num parent_part_num
P 3626cpr3662 3626c
P 87079pr9974 87079
P 3960pr9971 3960
R 98653pr0003 98086pr0003
R 98653pr0003 98088pat0003
R 98653pr0003 98089pat0003

4.2.10 Parts

Parts dataset dimensions
Rows Columns
52615 4
Parts dataset basic statistics
part_num name part_cat_id part_material
Length:52615 Length:52615 Min. : 1.00 Length:52615
Class :character Class :character 1st Qu.:17.00 Class :character
Mode :character Mode :character Median :41.00 Mode :character
Mean :38.91
3rd Qu.:60.00
Max. :68.00
Head of Parts dataset
part_num name part_cat_id part_material
003381 Sticker Sheet for Set 663-1 58 Plastic
003383 Sticker Sheet for Sets 618-1, 628-2 58 Plastic
003402 Sticker Sheet for Sets 310-3, 311-1, 312-3 58 Plastic
003429 Sticker Sheet for Set 1550-1 58 Plastic
003432 Sticker Sheet for Sets 357-1, 355-1, 940-1 58 Plastic
003434 Sticker Sheet for Set 575-2, 653-1, 460-1 58 Plastic

4.2.11 Sets

Sets dataset dimensions
Rows Columns
21880 6
Sets dataset basic statistics
set_num name year theme_id num_parts img_url
Length:21880 Length:21880 Min. :1949 Min. : 1 Min. : 0.0 Length:21880
Class :character Class :character 1st Qu.:2001 1st Qu.:273 1st Qu.: 3.0 Class :character
Mode :character Mode :character Median :2012 Median :497 Median : 31.0 Mode :character
Mean :2008 Mean :442 Mean : 161.4
3rd Qu.:2018 3rd Qu.:608 3rd Qu.: 139.0
Max. :2024 Max. :752 Max. :11695.0
Head of Sets dataset
set_num name year theme_id num_parts img_url
001-1 Gears 1965 1 43 https://cdn.rebrickable.com/media/sets/001-1.jpg
0011-2 Town Mini-Figures 1979 67 12 https://cdn.rebrickable.com/media/sets/0011-2.jpg
0011-3 Castle 2 for 1 Bonus Offer 1987 199 0 https://cdn.rebrickable.com/media/sets/0011-3.jpg
0012-1 Space Mini-Figures 1979 143 12 https://cdn.rebrickable.com/media/sets/0012-1.jpg
0013-1 Space Mini-Figures 1979 143 12 https://cdn.rebrickable.com/media/sets/0013-1.jpg
0014-1 Space Mini-Figures 1979 143 2 https://cdn.rebrickable.com/media/sets/0014-1.jpg

4.2.12 Themes

Themes dataset dimensions
Rows Columns
468 3
Themes dataset basic statistics
id name parent_id
Min. : 1.0 Length:468 Min. : 1.0
1st Qu.:250.5 Class :character 1st Qu.:186.0
Median :466.0 Mode :character Median :411.0
Mean :433.5 Mean :360.6
3rd Qu.:625.2 3rd Qu.:512.5
Max. :752.0 Max. :697.0
NA’s :145
Head of Themes dataset
id name parent_id
1 Technic
3 Competition 1
4 Expert Builder 1
16 RoboRiders 1
17 Speed Slammers 1
18 Star Wars 1
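
The parent_id column makes Themes a hierarchy: a missing parent_id (as for Technic above) appears to mark a top-level theme, and every other theme points to its parent. The sketch below resolves each theme to its top-level ancestor by following parent_id until it is missing. The function root_theme() is hypothetical, and the code assumes the hierarchy contains no cycles.

```r
# The six rows shown in "Head of Themes dataset".
themes <- data.frame(
  id        = c(1, 3, 4, 16, 17, 18),
  name      = c("Technic", "Competition", "Expert Builder",
                "RoboRiders", "Speed Slammers", "Star Wars"),
  parent_id = c(NA, 1, 1, 1, 1, 1)
)

# Hypothetical helper: walk up parent_id until a theme without a parent.
# Assumes there are no cycles, otherwise the loop would not terminate.
root_theme <- function(theme_id, themes) {
  repeat {
    parent <- themes$parent_id[match(theme_id, themes$id)]
    if (is.na(parent)) return(theme_id)
    theme_id <- parent
  }
}

themes$root_id   <- vapply(themes$id, root_theme, numeric(1), themes = themes)
themes$root_name <- themes$name[match(themes$root_id, themes$id)]
themes[, c("name", "root_name")]
```

Grouping sets by root_name rather than by theme name would aggregate sub-themes under their top-level theme.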

5 Detailed analysis

5.1 Colors

5.1.2 Distribution of colors by transparency

5.2 Elements

5.3 Minifigs

5.4 Themes

5.5 Parts

5.5.2 Most popular parts material

5.6 Sets

5.6.1 Sets with the most parts

Sets with the most parts
Name Number of parts
World Map 11695
Eiffel Tower 10001
The Ultimate Battle for Chima 9987
Titanic 9092
Colosseum 9036
Millennium Falcon 7541
AT-AT 6785
The Razor Crest 6194
Lord of the Rings: Rivendell 6182
NINJAGO City Markets 6163
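
A ranking like the one above can be obtained by sorting the Sets table by num_parts in descending order and keeping the first rows. The sketch below illustrates this in base R on the six rows from "Head of Sets dataset", so its result differs from the full-data table.

```r
# The six rows shown in "Head of Sets dataset" (img_url omitted).
sets <- data.frame(
  set_num   = c("001-1", "0011-2", "0011-3", "0012-1", "0013-1", "0014-1"),
  name      = c("Gears", "Town Mini-Figures", "Castle 2 for 1 Bonus Offer",
                "Space Mini-Figures", "Space Mini-Figures", "Space Mini-Figures"),
  year      = c(1965, 1979, 1987, 1979, 1979, 1979),
  num_parts = c(43, 12, 0, 12, 12, 2)
)

# Sort by part count (largest first) and keep the top 3 rows.
top_sets <- head(sets[order(-sets$num_parts), c("name", "num_parts")], 3)
names(top_sets) <- c("Name", "Number of parts")
top_sets
```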

6 Correlation

6.1 Total number of colors used in the sets and the year

Correlation between the total number of colors and the year
Year Number of colors
Year 1.000 0.823
Number of colors 0.823 1.000
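
The number of colors per year is not stored in any single table; it has to be derived by linking Sets to Inventories (via set_num) and Inventories to Inventory parts (via inventory_id). The sketch below shows one possible derivation in base R. It assumes "number of colors" means the count of distinct color_id values per year, and it uses made-up toy rows (set numbers A-1, B-1, C-1 are hypothetical), so the resulting coefficient is not the 0.823 reported above.

```r
# Toy tables (assumption): three sets, one inventory each.
sets <- data.frame(set_num = c("A-1", "B-1", "C-1"),
                   year    = c(2000, 2001, 2002))
inventories <- data.frame(id      = c(1, 2, 3),
                          set_num = c("A-1", "B-1", "C-1"))
inventory_parts <- data.frame(
  inventory_id = c(1, 1, 2, 2, 2, 3, 3, 3, 3),
  color_id     = c(0, 0, 0, 1, 4, 0, 1, 4, 15)
)

# Link parts -> inventories -> sets to attach a year to every part row.
parts_by_set  <- merge(inventory_parts, inventories,
                       by.x = "inventory_id", by.y = "id")
parts_by_year <- merge(parts_by_set, sets, by = "set_num")

# Count distinct colors per year.
colors_per_year <- aggregate(color_id ~ year, data = parts_by_year,
                             FUN = function(x) length(unique(x)))
names(colors_per_year) <- c("year", "n_colors")

# Pearson correlation between year and number of colors.
color_year_cor <- cor(colors_per_year$year, colors_per_year$n_colors)
colors_per_year
```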

6.2 Total number of parts per year and year

Correlation between total number of parts per year and year of production
Year Total number of parts
Year 1.000 0.806
Total number of parts 0.806 1.000
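
A correlation matrix of this shape can be produced by summing num_parts per year and passing the two resulting columns to cor(). The sketch below uses made-up toy values, so its coefficient differs from the one above; it is meant only to show the steps, and it assumes the Pearson coefficient (the default of cor()).

```r
# Toy data (assumption): five sets spread over three years.
sets <- data.frame(
  year      = c(2000, 2000, 2001, 2002, 2002),
  num_parts = c(10, 20, 50, 40, 60)
)

# Total number of parts released in each year.
parts_per_year <- aggregate(num_parts ~ year, data = sets, FUN = sum)

# 2 x 2 correlation matrix, rounded to three decimals as in the table above.
cor_matrix <- round(cor(parts_per_year), 3)
labels <- c("Year", "Total number of parts")
dimnames(cor_matrix) <- list(labels, labels)
cor_matrix
```

Replacing FUN = sum with FUN = length would count sets per year instead of summing their parts.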

6.3 Total number of sets and year

Correlation between total number of sets and year of production
Year Total number of sets
Year 1.000 0.879
Total number of sets 0.879 1.000

8 Predictions / Forecasting

TODO